Author: Yan, Y., Sheng, G., Chen, Y., Jiang, X., Guo, Z. H., & Qin, S. P.
Abstract
Data cleaning is a key step in data preprocessing for the state assessment of power equipment, as it improves data quality and utilization. Because device status information can be represented as a multivariate time sequence of the individual states, an iterative data cleaning method based on time sequence analysis is proposed. First, the abnormal data in a time sequence are classified, with missing values treated as one type of anomaly. Then the impact of each type of anomaly on the time sequence model is quantified, and the implementation steps of the iterative method are described. Finally, the approach is tested on online monitoring data from a piece of power equipment in the China Southern Power Grid. The results show that the method not only identifies abnormal data effectively but also repairs noise points and missing values, thereby meeting the data cleaning requirements.
Keywords: big data; data cleaning; time sequence; state data of power equipment
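
The abstract does not give the model or the anomaly classes in detail, so the following is only a minimal sketch of the general idea of iterative, model-based cleaning for a single variable, not the authors' procedure. It assumes an AR(1) model, a median/MAD threshold, and repair by one-step prediction; the names `fit_ar1` and `clean_series` and the parameters `k` and `max_iter` are hypothetical. Missing values (given as `None`) are treated as anomalies from the start, and the series is assumed to contain at least one observed value.

```python
import statistics


def fit_ar1(x, center):
    """Least-squares AR(1) coefficient for the series centred on `center`."""
    d = [v - center for v in x]
    num = sum(d[t] * d[t - 1] for t in range(1, len(d)))
    den = sum(d[t - 1] ** 2 for t in range(1, len(d)))
    return num / den if den else 0.0


def clean_series(raw, k=3.0, max_iter=20):
    """Iteratively flag and repair anomalies (assumed scheme, see lead-in).

    Returns (cleaned_values, sorted_indices_of_flagged_points).
    """
    x = list(raw)
    n = len(x)
    # Robust centre taken from the observed values only.
    center = statistics.median([v for v in x if v is not None])

    # Missing values count as anomalies: flag them and fill with the centre.
    flagged = set()
    for i, v in enumerate(x):
        if v is None:
            x[i] = center
            flagged.add(i)
    if n < 3:
        return x, sorted(flagged)

    for _ in range(max_iter):
        # Re-fit the model on the current (partly repaired) series.
        phi = fit_ar1(x, center)
        pred = {t: center + phi * (x[t - 1] - center) for t in range(1, n)}
        resid = {t: x[t] - pred[t] for t in range(1, n)}

        # Robust residual scale: 1.4826 * median absolute deviation.
        med = statistics.median(resid.values())
        sigma = 1.4826 * statistics.median([abs(r - med) for r in resid.values()])

        # Index 0 has no predecessor, so it is never a candidate.
        candidates = [t for t in resid if t not in flagged]
        if sigma == 0 or not candidates:
            break

        # Repair only the single worst point, then iterate, so that one
        # anomaly does not distort the judgement of its neighbours.
        worst = max(candidates, key=lambda t: abs(resid[t] - med))
        if abs(resid[worst] - med) <= k * sigma:
            break
        flagged.add(worst)
        x[worst] = pred[worst]

    return x, sorted(flagged)


# Illustrative data only: one spike and one missing reading.
series = [10.0, 10.1, 9.9, 10.0, 10.2, 50.0, 10.1, None,
          9.8, 10.0, 10.1, 9.9, 10.0, 10.2, 9.9, 10.1]
cleaned, flagged = clean_series(series)
print(flagged)
print(cleaned)
```

Repairing one point per pass and re-estimating the model each time is what makes the sketch iterative: a large anomaly inflates the fitted coefficient and the residual of the following point, so judging all points from a single contaminated fit would be unreliable.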